Characterization of Consistent Global Checkpoints in Large-Scale Distributed Systems

Authors

  • Roberto Baldoni
  • Jerzy Brzezinski
  • Jean-Michel Hélary
  • Achour Mostéfaoui
  • Michel Raynal
Abstract

Backward error recovery is one of the most widely used schemes for ensuring fault tolerance in distributed systems. When a failure occurs, it restores the distributed computation to an error-free global state from which it can be resumed to produce correct behaviour. Checkpointing is one of the techniques used to implement backward error recovery. In large-scale distributed systems, a coordinated approach to taking checkpoints is not practicable, while with an uncoordinated approach the probability of a domino effect during recovery may no longer be negligible. In this paper, we present a framework that allows us first to define the domino effect formally and second to state and prove a theorem that determines whether an arbitrary set of checkpoints is consistent. The theorem is very general, as it considers a semantics that includes missing and orphan messages. It plays a key role in designing uncoordinated checkpointing algorithms that take as few additional checkpoints as possible in order to ensure domino-free recovery.
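
To make the terms "orphan message" and "missing message" concrete, here is a minimal Python sketch. It is not the theorem from the paper, which concerns arbitrary sets of checkpoints. It only classifies messages against a full cut, that is, one chosen checkpoint per process. The names `Message`, `classify` and `is_consistent`, and the interval-numbering convention, are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    send_interval: int   # assumed convention: sent after checkpoint number send_interval
    receiver: str
    recv_interval: int   # assumed convention: received after checkpoint number recv_interval


def classify(messages, cut):
    """Split messages into (orphans, missing) with respect to a cut.

    `cut` maps each process name to the index of its chosen checkpoint.
    An event in interval i is recorded by checkpoint c exactly when i < c.
    """
    orphans, missing = [], []
    for m in messages:
        sent_recorded = m.send_interval < cut[m.sender]
        recv_recorded = m.recv_interval < cut[m.receiver]
        if recv_recorded and not sent_recorded:
            orphans.append(m)    # receipt recorded, sending not recorded
        elif sent_recorded and not recv_recorded:
            missing.append(m)    # sending recorded, receipt not recorded
    return orphans, missing


def is_consistent(messages, cut, allow_missing=True):
    """A cut is rejected if it has orphans, and optionally if it has missing messages."""
    orphans, missing = classify(messages, cut)
    return not orphans and (allow_missing or not missing)


if __name__ == "__main__":
    m = Message("P", 0, "Q", 1)
    print(classify([m], {"P": 0, "Q": 2}))  # m is an orphan for this cut
    print(classify([m], {"P": 1, "Q": 1}))  # m is a missing message for this cut
```

The `allow_missing` flag reflects that recovery schemes differ on whether messages in transit at the cut are tolerated, for example when they can be replayed from a log; which semantics applies is a design choice and is not fixed by this sketch.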

Similar resources

Necessary and sufficient conditions for transaction-consistent global checkpoints in a distributed database system

Checkpointing and rollback recovery are well-known techniques for handling failures in distributed systems. The issues related to the design and implementation of efficient checkpointing and recovery techniques for distributed systems have been thoroughly understood. For example, the necessary and sufficient conditions for a set of checkpoints to be part of a consistent global checkpoint has be...

Independent global snapshots in large distributed systems

Distributed systems depend on consistent global snapshots for process recovery and garbage collection activity. We provide exact conditions for an arbitrary checkpoint based on independent dependency tracking within clusters of nodes. The method allows nodes (within clusters) to independently compute dependency information based on available (local) information. The existing models of...

Transaction-Consistent Global Checkpoints in a Distributed Database System

Checkpointing and rollback recovery are well-known techniques for handling failures in distributed database systems. In this paper, we establish the necessary and sufficient conditions for the checkpoints on a set of data items to be part of a transaction-consistent global checkpoint of the distributed database. This can throw light on designing efficient, non-intrusive checkpointing techniques...

Distributed multi-agent Load Frequency Control for a Large-scale Power System Optimized by Grey Wolf Optimizer

This paper aims to design an optimal distributed multi-agent controller for load frequency control and optimal power flow purposes. The controller parameters are optimized using the Grey Wolf Optimization (GWO) algorithm. The designed optimal distributed controller is employed for load frequency control in the IEEE 30-bus test system with six generators. The controller of each generator is consider...

Finding Consistent Global Checkpoints in a Distributed Computation

Finding consistent global checkpoints of a distributed computation is important for analyzing, testing, or verifying properties of these computations. In this paper we present a theoretical foundation for finding consistent global checkpoints. Given an arbitrary set S of local checkpoints, we prove exactly which sets of other local checkpoints can be combined with S to build consistent global che...

Publication date: 1995